DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model

Abstract

Speech-driven gesture synthesis is a field of growing interest in virtual human creation. However, a critical challenge is the inherently intricate one-to-many mapping between speech and gestures. Previous studies have explored generative models and achieved significant progress. Notwithstanding, most synthetic gestures are still vastly less natural. This paper presents DiffMotion, a novel speech-driven gesture synthesis architecture based on diffusion models. The model comprises an autoregressive temporal encoder and a denoising diffusion probability module. The encoder extracts the temporal context of the speech input and historical gestures. The diffusion module learns a parameterized Markov chain to gradually convert a simple distribution into a complex distribution and generates gestures according to the accompanying speech. Compared with baselines, objective and subjective evaluations confirm that our approach can produce natural and diverse gesticulation and demonstrate the benefits of diffusion-based models for speech-driven gesture synthesis. Project page: https://github.com/zf223669/DiffMotion.
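The abstract describes a denoising diffusion module that learns a parameterized Markov chain, conditioned on context from an autoregressive temporal encoder. The following is a minimal, stdlib-only Python sketch of a generic DDPM-style sampler arranged that way; it is not the DiffMotion implementation. The linear β schedule (1e-4 to 0.02) is an assumption borrowed from the original DDPM paper, the choice σ_t² = β_t is one common DDPM option, and `encoder`, `eps_model`, `stub_encoder`, and `zero_eps` are hypothetical stand-ins for the trained temporal encoder and noise-prediction network.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule beta_1..beta_T (assumed; values follow the original DDPM paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, noise):
    """Forward (noising) chain in closed form: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


def p_sample_step(xt, t, betas, abar, eps_hat, rng):
    """One reverse (denoising) step from predicted noise eps_hat, with sigma_t^2 = beta_t."""
    beta = betas[t]
    alpha = 1.0 - beta
    coef = beta / math.sqrt(1.0 - abar[t])
    mean = [(x - coef * e) / math.sqrt(alpha) for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no noise is added on the final step
    sigma = math.sqrt(beta)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample_pose(dim, num_steps, context, eps_model, rng):
    """Run the reverse chain from Gaussian noise to one pose vector, conditioned on context."""
    betas = linear_betas(num_steps)
    abar = alpha_bars(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(num_steps)):
        x = p_sample_step(x, t, betas, abar, eps_model(x, t, context), rng)
    return x


def generate_gestures(speech_frames, dim, num_steps, encoder, eps_model, seed=0):
    """Autoregressive loop: each pose is conditioned on the current speech frame
    and the previously generated pose via the (hypothetical) encoder."""
    rng = random.Random(seed)
    prev_pose = [0.0] * dim
    poses = []
    for speech in speech_frames:
        context = encoder(speech, prev_pose)
        prev_pose = sample_pose(dim, num_steps, context, eps_model, rng)
        poses.append(prev_pose)
    return poses


# Hypothetical stand-ins for trained networks, only so the sketch runs end to end.
def stub_encoder(speech, prev_pose):
    return list(speech) + list(prev_pose)


def zero_eps(x, t, context):
    return [0.0] * len(x)
```

For example, `generate_gestures([[0.0], [1.0]], dim=4, num_steps=50, encoder=stub_encoder, eps_model=zero_eps)` returns two 4-dimensional pose vectors. With the stubs the poses carry no meaning; in a trained system, `eps_model` would be fit to predict the noise injected by `q_sample`, which is what lets the reverse chain turn Gaussian noise into speech-conditioned gestures.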

Similar sources

Data-driven Speech Denoising Using Noise Profiles

This paper describes a targeted, undemanding data-driven signal processing approach to identify, control, and suppress a specific background noise which is present in a recording together with a spoken utterance. A background noise (e.g. the sound of an engine on board a bus) negatively influences the ASR system performance by distorting the speech signal spectrum. Thus it is necessary to p...

Combined Gesture-Speech Recognition and Synthesis Using Neural Networks

Sign languages such as Spanish Sign Language (LSE) are the primary means of communication among members of the Deaf community. However, this language is not widely known outside of this community. The techniques for automatically recognizing hand signs proposed in this paper allow creating systems which can help deaf people to communicate with others, by providing them with computer tools for assisted co...

Towards Natural Gesture Synthesis: Evaluating Gesture Units in a Data-Driven Approach to Gesture Synthesis

Virtual humans still lack naturalness in their nonverbal behaviour. We present a data-driven solution that moves towards a more natural synthesis of hand and arm gestures by recreating gestural behaviour in the style of a human performer. Our algorithm exploits the concept of gesture units to make the produced gestures a continuous flow of movement. We empirically validated the use of gesture u...

Combined Gesture-Speech Analysis and Synthesis

Multi-modal speech and speaker modelling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modelling of head, hand and...

Data Driven Gesture Model Acquisition using Minimum Description Length

An approach is presented to automatically segment and label a continuous observation sequence of hand gestures for a complete unsupervised model acquisition. The method is based on the assumption that gestures can be viewed as repetitive sequences of atomic components, similar to phonemes in speech, starting and ending in a rest position and governed by a high level structure controlling the te...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2023

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-27077-2_18